ABSTRACT



One approach to increasing our understanding of the rating
process is to examine behavioral components of decision-making.
Although observable rater behavior during appraisal is still removed
from the actual contents of internal processing, these behavioral
indices may provide important clues toward identifying determinants
of rating success. A methodology called Instantaneous Report of
Judgments (IRJ) was developed to measure rater behavior during
appraisal. Four rating behaviors were examined which are believed to
reflect important dimensions of rating ability: amount of
information utilized, sensitivity to differences between ratees,
sensitivity to ratee strengths and weaknesses, and observational
style. Two sets of studies were conducted using IRJ. The first set
consisted of basic descriptive studies of rater behavior during the
rating process with the goal of identifying stable components of
rating style. The second set involved construct validation of the
IRJ procedure and rating data. Results from these studies are
presented and discussed briefly. It is concluded that IRJ can
provide reliable and valid data and that these behavioral indices
shed some light on the underlying mechanisms of accuracy.
Behavioral Indices of Raters' Cognitive Processing
in Performance Appraisal



Performance rating research is in a state of transition. Instead
of searching for ways of improving the mechanics of appraisal (e.g.,
better rating forms, more time allotted to the task, more effective
rater training), researchers recommend investigating the processes
underlying performance rating (Feldman, 1981; Ilgen & Feldman, in
press; Landy & Farr, 1980). While many have stressed the importance
of this kind of research for several years (e.g., Borman, 1979;
Dunnette & Borman, 1979), few studies have been completed. One reason
for this delay is the absence of easy methodologies for studying
process variables for psychologists in general and I/O psychologists
in particular. Thus, many I/O psychologists who are interested in
rating process research have borrowed paradigms outside I/O and
adapted them to the appraisal context. This paper describes a
methodology adapted from cognitive psychology for analyzing the rating
process. This methodology, called Instantaneous Report of Judgments
(IRJ), yields behavioral indices of raters' cognitive processing in
performance appraisal.



Background and Rationale 

Why examine behavioral components of the rating process? Two
reasons come to mind. First, it is generally accepted that previous
attempts to increase accuracy by examining the relationship between
input variables (i.e., training, experience, interpersonal accuracy
correlates) and appraiser outcomes (i.e., errors) have yielded
disappointing results. To help explain the failure of previous
studies, we must look deeper into the relationship between input and
outcome variables. Why don't rating formats aid raters'
decision-making? Why have training programs failed to improve
rater accuracy substantially? A description of rater behavior (e.g.,
processing of information) during appraisal will help determine how
various input variables affect the rating process and, hence, outcomes.
Second, several interesting research questions may be answered by
analyzing rater behavior at the micro level, such as: Does sex or race
bias enter into performance rating at the selection or evaluation
stage of processing? Do raters search for disconfirmatory information
once a judgment is formed? Do raters utilize the same information to
generate performance ratings for several dimensions? In sum,
knowledge of rater behavior during the rating process may suggest what
to change behaviorally to increase rater accuracy.

In this paper, I will describe IRJ and present findings from
studies employing the IRJ procedure. I hope to show that important
new information about the rating process can be obtained through this
methodology and that insights into the determinants of rating accuracy
are likely by following this approach. First, let me be clear about
what is meant by the term rating process, and what constructs I
intend to measure through IRJ.

The rating process is conceptualized as a five-step
information-processing sequence that results in an overall performance
rating for a particular performance dimension. The steps consist of
internalizing task requirements, selecting relevant information,
evaluating selected information, storing and recalling stored
information, and combining evaluations (see Banks, 1981 for more
detail). It is important to note in this conceptualization that the
task as given may not be identical to its interpretation and that
information search and selection is a central component of the
process. Both of these aspects of the conceptualization are important
because they allow for individual differences in the selection and
interpretation of ratee data, a consideration that is downplayed or
ignored in other investigations of the rating process (e.g.,
policy-capturing). A methodology is desired that measures these
individual differences explicitly, because it is believed that these
individual differences will play a key role in unraveling the mystery
of accurate rating.

Based on this conceptualization of the rating process and on folk
knowledge of the secrets of successful rating in the literature, four
constructs were hypothesized to comprise rating ability:¹ (1) degree
of information utilization; (2) sensitivity to differences between
ratees; (3) sensitivity to ratee strengths and weaknesses; and (4)
global vs. specific observational style. These constructs are
described below.

1. Degree of information utilization. This construct is defined
as the amount of information a rater utilizes during a rating task.
Utilization of information is considered important because the
literature suggests that the more information a rater uses, the higher
the probability job-related information will influence evaluation
(Schmitt, 1976).

2. Sensitivity to ratee differences. This construct is similar
to one of Cronbach's components of judgmental accuracy, differential
elevation (DE; Cronbach, 1955). This construct reflects a rater's
ability to detect differences between ratees when differences actually
exist. The higher the variance in performance ratings across ratees,
the more differences a rater detects. The literature claims that a
lack of differentiation, or restriction of range, leads to lower
accuracy (cf. Carroll & Schneier, 1982). While this literature is
based on summary or overall ratings rendered for a ratee, sensitivity
to ratee differences could be extended to the level of individual
judgments which compose summary ratings.

3. Sensitivity to ratee strengths and weaknesses. This
construct attempts to capture a rater's ability to evaluate
performance in an even-handed or balanced manner. Within a
performance dimension, a lack of sensitivity has been characterized as
a failure to seek or recognize disconfirmatory ratee information after
an impression is established (Snyder & Swann, 1978). A confirmatory
strategy, one in which a rater seeks information consistent with his
or her impression, may result in low variability in information
utilized, and hence failure to utilize all relevant information if
both positive and negative information are present.

4. Global vs. specific observational style. This construct
attempts to capture the kind of information a rater processes during
appraisal. "Global" processors may be characterized as those who
develop global impressions of the ratee by processing information at a
more abstract level than "specific" processors. Global processors do
not develop impressions on the basis of specific behavioral events;
rather, they form impressions by generalizing across ratee behaviors,
forming abstractions from the behavioral data. For example, a global
processor may attend to the ratee's attitude across all incidents that
involve a conflict with a subordinate. In this case, the rater may be
evaluating a performance dimension different from the one explicitly
stated on the rating form (e.g., "attitude" vs. "ability to resolve
conflict"). It is believed that specific processors, on the other
hand, generate summary ratings by combining separate and specific bits
of information and avoid generalization across incidents. This latter
style may reduce the probability a few salient events will swamp
subsequent judgments (Schmitt, 1976).

Each construct was operationalized by an observable rating
behavior emitted during the rating process. Constructs and associated
rating behaviors are listed in Figure 1. Notice that the amount and
kind of information utilized requires that one know the number and
content of judgments made by a rater. Instantaneous Report of
Judgments (IRJ) was developed so that raters could describe their
judgments when they occurred during a rating task. A description of
IRJ and how these constructs were measured follows.



Insert Figure 1 about here



Instantaneous Report of Judgments

Instantaneous Report of Judgments, or IRJ, was based for the most
part on information processing theory as presented in cognitive
psychology (Ericsson & Simon, 1980; Newell & Simon, 1972). Briefly, a
rater reports his or her judgments formed during a rating task by
using a panel of buttons to record judgments of ratee performance and
by reporting verbally behavioral cues that "trigger" judgments. (See
Banks, 1980 & 1981 for more detail.) Basically, IRJ provides raters a
mechanism for reporting the contents of their decision-making whenever
they feel the "urge" to report.
The four behavioral indices of raters' cognitive processing
(number of judgments, variation in judgments, variation in mean
judgments, and latency) are obtained in the following way. Each
button press on the panel signals that a judgment was made; therefore,
the number of button presses indicates the number of judgments made
(NJ). Since button values duplicate the point values on the rating
scale, the particular button pressed indicates the judged level of
ratee performance. Variation in judgments is obtained by the standard
deviation of the values of buttons pressed (SDJ). When these values
are averaged, yielding a mean judgment level per ratee, variation in
mean judgments is obtained by calculating the standard deviation of
mean judgments across ratees (SDj). A timing device which ties button
presses to ongoing ratee behavior allows measurement of latency
(LAT). It also ties judgments to ratee cues, permitting
identification of information utilized by a rater in forming a
judgment. Thus, IRJ allows measurement of four rating behaviors, plus
identification of cues selected and processed during the rating task.
These operations, in turn, allow measurement of the constructs
believed to be related to rating ability.
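
The four indices just described can be illustrated with a short sketch. It is not the original scoring procedure: the data layout (a list of time-stamped button presses per rating task), the function names, the use of the sample standard deviation, and the reading of latency as time to the first button press are all assumptions made for illustration.

```python
from statistics import mean, stdev


def task_indices(presses):
    """Indices for one rating task (one ratee, one dimension).

    presses: list of (seconds_from_start, button_value) tuples, with
    button values on the 1-7 rating scale (hypothetical layout).
    """
    values = [value for _, value in presses]
    nj = len(values)                            # NJ: number of judgments
    sdj = stdev(values) if nj > 1 else 0.0      # SDJ: sample SD (assumed)
    # LAT: assumed here to be the time of the first button press
    lat = min(t for t, _ in presses) if presses else None
    mean_judgment = mean(values) if values else None
    return {"NJ": nj, "SDJ": sdj, "LAT": lat, "mean_judgment": mean_judgment}


def sd_of_mean_judgments(tasks):
    """SDj: SD of the mean judgment level across ratees.

    tasks: one press list per ratee, all for the same dimension.
    """
    means = [task_indices(presses)["mean_judgment"] for presses in tasks]
    return stdev(means)


# Three judgments made while viewing one videotaped ratee
example = task_indices([(30.0, 4), (75.5, 5), (140.0, 3)])
```

Averaging NJ, SDJ, and LAT over a rater's tasks would give aggregate scores of the kind the paper later labels AVGNJ, AVGSDJ, and AVGLAT.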

Raters in IRJ studies individually view videotaped performances
of managerial behavior (5-7 minutes long). Videotapes were previously
developed by Borman and his associates (Borman, Hough, & Dunnette,
1976). Raters view and rate a single performance dimension for each
manager. In other words, one manager is evaluated on one dimension
per viewing, and this constitutes a single rating task. In each
rating task, raters press a button whenever they "feel" they are
making a judgment, and they press the button (1-7) that best
represents their judgment of ratee performance. After pressing a
button, they report verbally the basis for their judgment. For every
task, raters are encouraged to press buttons as many times as they
make judgments, and at the conclusion of each task they render a
summary rating. In all, six ratees are rated along each of six
performance dimensions.

Research Findings

Several studies have been conducted using IRJ, and these are
outlined in Figure 2. These studies can be divided into two



Insert Figure 2 about here 



groups: descriptive studies of the rating process and construct
validation of IRJ. The descriptive studies were designed to collect
basic information about rater behavior during a rating task. They
sought to determine how much information raters utilize and what
information is utilized, and to determine the presence (absence) of a
general rating style. Details of these studies can be found in Banks
(1981; 1982). Findings of these studies will be summarized briefly
below.

In terms of judgments made, a rater makes about seven judgments
per ratee, though large individual differences exist in the number. A
rater tends to make judgments early in the evaluation period (within 2
minutes), and the range of judgments made for each ratee is relatively
small (within 1 to 2 points on a 7-point scale). A rater also does
not differentiate greatly across ratees; the range of mean judgments
is about 1.5 points. When rating behavior was observed across tasks,
marked similarities in rating behavior were found. This suggests that
a rater tends to utilize a consistent rating style across tasks. For
example, raters appeared to be consistent regarding number of
judgments made (NJ) and judgment latency (LAT), but variation in
judgments (SDJ) was less consistent (median internal consistency
reliabilities = .95, .77, and .61, respectively). An interesting
finding emerged when SDJ was examined across tasks. That is, raters
seemed to narrow their range of judgments with practice. It is not
clear whether this narrowing of judgments was the result of becoming
more skilled over time or whether experience with the task changed
their reporting.

When cue selection and evaluation were examined, it was found that
untrained raters do not tend to select the same information when they
evaluate the same ratee along the same performance dimension.
Moreover, even when raters selected the same information, they
evaluated it differently. These latter findings suggest that
untrained raters (college students) differ substantially in the
factors affecting information or cue selection (e.g., interpretation
of task requirements, motivation, attention) and cue evaluation (e.g.,
rating criteria, cognitive schema, preconceived notions of ratee
performance). Simply providing well-developed rating formats like
BARS and removing conflicting motives (e.g., eliminating
responsibility for the ratings) is not sufficient to guide the rating
process to the same end.

The second set of studies sought to determine the meaningfulness
of these rating behaviors. First, a rate-rerate reliability study was
conducted to determine if these rating behaviors were repeatable when
identical tasks were administered I ^ 5 months later. For a subset of
16 raters, mean judgments calculated for Time 1 and Time 2 tasks were
highly correlated, as were overall performance ratings (median r's =
.83, respectively). These findings suggest that a rater arrived
at the same outcome at both administrations. For NJ, LAT, and
especially SDJ, reliability was lower (median r's = .54, .49, and
-.05, respectively). A rater tended to press a different number of
buttons (usually fewer) and pressed a smaller range of buttons upon
the second viewing. As with internal consistency analyses, this
analysis suggests some revision in rating behavior with practice, thus
lowering reliability estimates. But one could argue that the rating
tasks were no longer identical, since a rater possessed more
information about the ratee in the second viewing than the first.
This would result in artificially low estimates of reliability.
Overall, reliability analyses suggest that although judgments differ in
quantity and range over time, they combine to form the same
conclusion, a finding that argues against the possibility that raters
responded randomly.

Generalizability of IRJ findings was assessed in part by
comparing managers' with students' rating behaviors in identical
rating tasks. Both managers and students rated each of the six ratees
along each performance dimension in a total of six rating sessions.
Managers and students were compared in terms of rating behavior (NJ,
SDJ, SDj, and LAT) and rating outcomes (accuracy, halo, leniency, and
restriction of range). Various person perception variables shown to
be related to rating success (Borman, 1979) were also compared. Means
and standard deviations of rating behaviors, rating outcomes, and
person perception variables are shown in Table 1.



Insert Table 1 about here



No significant differences were found between the two groups except
for age and cognitive complexity (students were younger, but smarter).
Although managers and students do not differ significantly on these
variables, some important pattern differences in the relationships
between variables were evident. Pattern differences will be
elaborated on in a later section. For the moment, let us examine each
variable singly. In general, the behavior of managers and students in
the rating tasks was quite similar, suggesting that we would not
expect managers in general to respond very differently when given the
same tasks as students.

Another study was conducted to determine whether raters' reporting
of the contents of their decision-making altered rating outcomes. If
so, this would limit the generalizability of IRJ findings to typical
rating tasks. Mean performance ratings were calculated across raters
from the Banks (1979) sample for each ratee on each dimension. These
mean ratings were correlated with mean performance ratings collected
by Borman (1979). Borman's ratings were obtained by having raters
simply view the same videotapes and record summary performance
ratings. Despite differences in procedure, samples, and rating
instructions, ratings from the two studies correlated .90 (p<.01) and
the sum of the differences between the two groups of means was near
zero (d = .3). Similar correlations with the Borman ratings were
found with mean ratings from a later IRJ study (r = .91, p<.01).
Recently, I collected ratings from an independent sample of student
raters (N = 37) using the Borman procedure, and again the correlations
between these ratings and ratings from the IRJ studies were high (r's
= .94 & .96). In general, it can be concluded that findings from IRJ
studies can probably be generalized to rating tasks typically
encountered in appraisal research. More important, the IRJ procedure
does not seem to interfere greatly with the rating process.
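
The comparison of mean ratings across procedures can be sketched as follows. This is an illustrative reconstruction only: the function names and the flat list layout (one mean rating per ratee-dimension cell, in the same order for both procedures) are assumptions, and the numbers in the usage lines are invented for the example rather than taken from the studies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_procedures(irj_means, plain_means):
    """Return (r, d): the correlation between the two sets of cell means
    and the sum of the cell-by-cell differences."""
    r = pearson_r(irj_means, plain_means)
    d = sum(a - b for a, b in zip(irj_means, plain_means))
    return r, d


# Invented cell means for four ratee-dimension cells
r, d = compare_procedures([3.0, 4.0, 5.0, 6.0], [3.5, 4.5, 5.5, 6.5])
```

A high r together with a d near zero would indicate that the two procedures order the cells alike and do not differ in overall level, which is the pattern reported above.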

Next, rating behaviors collected using IRJ were correlated with
rating outcomes (accuracy, halo, leniency, and restriction of range)
to determine which rating behaviors were associated with accuracy and
rating error. Data from managers and students were examined
separately. Originally, the manager sample was separated into two
subgroups, expert and nonexpert raters. Experts were differentiated
from nonexperts on the basis of textbook-type knowledge of appraisal
and on the basis of rating experience as judged by the author after
in-depth interviews. These subgroups were combined when no
significant differences in rating behaviors or rating outcomes were
found. (So much for armchair criterion analyses.) Relationships
between rating behaviors and rating outcomes for each group are shown
in Table 2. It can be seen that restriction of range
error is consistently related to AVGSD, the variation in mean
judgments (SDj) averaged across ratees. AVGSD is the micro-level
analog of restriction of range since both are computed in terms of
differentiation between ratees. A high correlation between the two
measures means that differentiation (or lack of it) at the judgment
level is consistent with differentiation at the summary rating level.
For the manager sample, leniency was related to AVGNJ, the number of
judgments (NJ) averaged across tasks, and AVGSD, the average variation
in mean judgments (SDj). This suggests that the more judgments a
rater makes and the greater the differentiation between ratees, the
lower the leniency. However, these correlations were not found in the
student sample.

The most interesting aspect of Table 2 is the relationship
between rating behaviors and accuracy. Accuracy was measured by the
correlation between each rater's set of 36 summary ratings and
Borman's mean expert ratings (Borman, 1979). For the student sample
only, accuracy was related to AVGNJ, AVGSDJ, and AVGLAT, aggregate
scores of NJ, SDJ, and LAT averaged across tasks. These correlations
showed that accurate raters tend to make fewer judgments, exhibit less
variation in judgments, and take more time generating the first
judgment than less accurate raters.
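
The accuracy index described above can be written out as a minimal sketch. The grid layout (one row per ratee, one column per dimension) and the function names are assumptions introduced for illustration; only the definition of accuracy as a correlation with mean expert ratings comes from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rater_accuracy(rater_grid, expert_grid):
    """Correlate one rater's summary ratings with mean expert ratings.

    Both arguments are ratee-by-dimension grids (lists of rows); in the
    studies described here that would be 6 x 6 = 36 ratings per rater.
    """
    rater = [v for row in rater_grid for v in row]
    expert = [v for row in expert_grid for v in row]
    if len(rater) != len(expert):
        raise ValueError("rater and expert grids must have the same size")
    return pearson_r(rater, expert)


# Tiny 2 x 2 grids, invented purely to show the call
acc = rater_accuracy([[1, 2], [3, 4]], [[2.0, 4.0], [6.0, 8.0]])
```

Because the index is a correlation, it rewards a rater for ordering the 36 cells as the experts do and is insensitive to a constant shift in level, which is one reason leniency is treated as a separate outcome.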

At first glance, correlations between rating behavior and
accuracy appear to contradict expectations set forth earlier in this
paper. Recall that it was hypothesized that rating accuracy would be
associated with high NJ, high SDJ, and low LAT, according to the
performance appraisal literature. In the student sample, the opposite
seemed to be true; raters who made few judgments and exhibited longer
latencies of responding tended to be more accurate. This potential
inconsistency can be explained by exploring the process by which
judgments are produced.
Early responders could be responding appropriately or
inappropriately depending on the cues responded to. If cues relevant
to the evaluation of a particular performance dimension are present
early in the ratee performance, a quick response would be expected and
appropriate. However, if the rater responds early to irrelevant cues
(a sign of failure to discriminate cues), then early responding would
be inappropriate. It may also be the case that even if relevant cues
are responded to early in the process, the rater may fail to report
judgments until a sufficient amount of confirmatory (or
disconfirmatory) evidence has accumulated to build confidence in the
judgment. Since raters in the student sample were relatively
inexperienced in giving performance appraisals, it would seem
reasonable to hypothesize that their inexperience led to cautious (and
therefore delayed) reporting for accurate raters and more spontaneous
reporting for less accurate raters.

The lack of correlations for the manager sample also seems
disturbing, but this too can be explained. The scatterplot of the
relationship between accuracy and each rating behavior revealed
moderate curvilinear relationships. For NJ, accurate raters tended to
make either a high number of judgments or a low number whereas less
accurate raters made about an average number (eta = .39).² Since the
eta coefficient (.39) is higher than the Pearson coefficient (r = .01)
between NJ and accuracy, we can conclude that a nonlinear relationship
does indeed exist. These data suggest that accurate raters in the
manager sample exhibit one of two styles of responding: early
responders who have the experience and confidence to identify and
report relevant cues and late responders who wait for evidence to
accumulate before reporting. Thus, accurate managers may be
characterized as exhibiting one of two styles of rating whereas
accurate students exhibit only one. Students may not be sophisticated
enough in appraisal to have developed a fine-tuned cognitive schema
for interpreting performance-related cues confidently or for
recognizing subtle behavioral cues. The "gist" of the cues may be
obvious to the students in the aggregate, but taken singly, cues may
not be interpreted as well as they would by managers. At this point,
this explanation for the research findings should be regarded as
speculative.
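
The contrast between eta and the Pearson coefficient can be made concrete with a small sketch. The paper does not say how NJ was grouped to compute eta, so the three-level coding (low, average, high) and all of the data values below are assumptions invented to show a U-shaped pattern.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_ratio(levels, y):
    """Eta: sqrt(SS_between / SS_total) of y across the levels of x."""
    grand = sum(y) / len(y)
    ss_total = sum((v - grand) ** 2 for v in y)
    groups = {}
    for level, v in zip(levels, y):
        groups.setdefault(level, []).append(v)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return sqrt(ss_between / ss_total)


# Hypothetical raters: NJ coded 1 = low, 2 = average, 3 = high
nj_level = [1, 1, 2, 2, 3, 3]
accuracy = [0.8, 0.7, 0.4, 0.5, 0.8, 0.7]   # invented U-shaped pattern

r = pearson_r(nj_level, accuracy)              # close to zero
eta = correlation_ratio(nj_level, accuracy)    # clearly above zero
```

Eta can never be smaller than the absolute value of r for the same grouping, so a large gap between the two is what signals a relationship that a straight line misses.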

Finally, person perception variables were correlated with rating
behavior and outcome measures. These correlations are found in Table
3. As in Borman's (1979) study, intellectual factors were correlated
with accuracy, but for managers only. (Correlations in the student
sample may have been artificially low due to restriction of range in
intellectual ability.) When rating behaviors were correlated with
person perception variables, a different pattern of correlations was
found for managers and students. For managers, many person perception
variables



Insert Table 3 about here



were related to AVGNJ and AVGLAT whereas for students, few
correlations were found. This suggests that managers' rating behavior
(NJ and LAT) was affected by person perception variables more than
students' was. A more interesting finding is that although
AVGNJ and AVGLAT were related to appraisal knowledge, near-zero
correlations were found between appraisal knowledge and accuracy for
both groups. Apparently, what raters actually do to achieve accuracy
is not necessarily what textbooks suggest. This observation is worth
dwelling on a minute. Originally it was hypothesized that the more
information a rater utilizes and the quicker he or she responds, the
higher the accuracy. While rating behaviors were consistent with
knowledge of "good" appraisal techniques, they were not correlated as
hypothesized with accuracy. These findings imply that our folk
knowledge of "good appraisal" may be inaccurate and that we need to
rethink what rating behaviors would be expected to result in accurate
rating.

Two research studies underway concern raters' use of cues. The
first examines whether a rater uses single behavioral cues for
evaluating more than one performance dimension. An earlier study
which used a between-subjects design showed that some behavioral cues
were salient across several dimensions (Banks, 1982). If a single
rater utilizes the same cue across dimensions, then halo "error" may
be reinterpreted as a by-product of normal decision-making rather than
the result of overgeneralized global impressions. That is, halo would
be caused by the overlap in information used to generate dimension
ratings. If halo error is in fact a problem of multiple-cue use,
training programs to reduce halo error may be more successful if
raters are trained to increase their reliance on more discriminating
cues.

The second study in progress involves raters' identification of
relevant cues. It is essentially a study of raters' ability to
separate relevant from irrelevant information. It is expected that
those who utilize a high proportion of relevant information (to total
information utilized) will be more accurate.

In summary, a good deal of descriptive work on the rating process
has been completed. We found that raters exhibit a rating style that
is consistent across tasks in many respects, but some revision in this
style occurs with practice. We also found that accurate managers
exhibit two different rating styles whereas accurate students exhibit
only one. Reasons for these differences between styles were explored.
Nonetheless, in each case, specific rating behaviors were related to
rating accuracy (though opposite to expectation). Finally, it was
found that while folk knowledge of "good" appraisal techniques was
related to rating behaviors in the hypothesized direction, appraisal
knowledge failed to correlate significantly with rating accuracy. The
correlations between rating behaviors and rating accuracy and the lack
of correlation between appraisal knowledge and accuracy suggest that
we need to revise our thinking about what kind of rating behavior is
related to accuracy. And last, we concluded that the IRJ procedure
does not seem to interfere with raters' cognitive processing and that
IRJ yields, for the most part, reliable and valid data.

Although a good deal of work is completed, more remains. This
paper intended to show that rating process studies can be done, though
slowly. This work suggests to me that the rating process is quite
complex and fraught with potential errors. Knowledge gained from such
work has opened up new avenues of thinking about appraisal and how to
reduce potential errors. Let me elaborate on that point.
Typical appraisal systems apparently require raters to be good
test developers. The only parts of the "test" a rater is given to
measure a subordinate's work performance are the constructs to be
measured and definitions of those constructs (with some hints as to
what items may be relevant; they are called behavioral anchors or
examples). Raters are left with the problem of figuring out what
items (sic. behaviors) should be observed to evaluate performance and
what their discriminating power is, how items should be scored and
combined, and finally how raw scores should be interpreted. No wonder
inexperienced, untrained raters are unmotivated to do it when they do
it poorly and are held accountable! Taking "test development" out of
the appraisal process may improve rater accuracy. Training in
assessment similar to assessor training in assessment centers is
another possible change. I will leave the reader to think of others.

In conclusion, I believe we need to push ahead with rating
process research to learn what variables affect the rating process and,
more important, which lead to accuracy so that we can begin to design
specific and potent interventions.
Footnotes



1 The literature's recommendations are considered folk knowledge since they
are unproven but believed.

2 Since NJ is correlated with LAT .79 for both samples, they will be used
interchangeably in the discussion.
Table 1

Means and Standard Deviations of Rating Behaviors,
Rating Outcomes, and Individual Difference Variables for Managers and Students

                                            MANAGERS            STUDENTS
VARIABLES                                   Mean       SD      Mean       SD      Z    Sig.

AVGNJ                                       3.78     1.86      3.58     1.97     NS      NS
AVGSDJ                                       .69      .24       .75      .28     NS      NS
AVGSD                                        .19      .04       .18      .03     NS      NS
AVGLAT                                    183.74   116.89    206.16  98.6[?]     NS      NS
Halo                                        1.05      .35      1.19      .32     NS      NS
Leniency                                    3.60      .44      3.72      .42     NS      NS
Restriction of Range                        2.09      .37      2.02      .27     NS      NS
Accuracy                                    1.03      .27       [?]      .28     NS      NS
Embedded Figures                           13.08     4.18     14.95     2.70    [?]     [?]
Bieri Cognitive Complexity Form            94.27    25.13     89.15    11.52    [?]     [?]
Age                                        32.65    14.31       [?]     3.15    [?]     [?]
Detail Orientation                          9.50     3.11      9.70     2.79     NS      NS
Task Orientation                            8.50     3.21      8.40     3.54     NS      NS
Intellectual Ability and Interest           5.55     2.63      7.00     2.44     NS      NS
Personal Adjustment                        36.27    12.34     36.95     9.59     NS      NS
Realistic Theme                              [?]     1.51      2.50     1.39     NS      NS
Investigative Theme                          [?]     1.53      2.80     1.88     NS      NS
Artistic Theme                               [?]     1.76      3.20     1.70     NS      NS
Social Theme                                 [?]     1.69      3.90     1.77     NS      NS
Enterprising Theme                           [?]     1.86      4.70     1.75     NS      NS
Conventional Theme                           [?]     1.81      2.85     1.72     NS      NS
Outgoing vs. Shy                            6.22     2.52      6.35     2.32     NS      NS
Adjusted vs. Maladjusted                    6.86     2.40      7.00     2.61     NS      NS
Decisive vs. Indecisive                     6.41     2.41      6.45     2.25     NS      NS
Friendly vs. Unfriendly                     6.94     2.70      7.65     1.95     NS      NS
Interested in Others vs. Self-Absorbed      6.88     2.40      7.20     2.09     NS      NS
Cheerful vs. Humored                        6.58     2.60      5.95     2.06     NS      NS
Dominant vs. Submissive                     5.88     2.57      6.55     2.28     NS      NS
Considerate vs. Inconsiderate               6.75     2.61      7.50     1.96     NS      NS
CPI Tolerance Score                        13.05     5.05     12.95     3.54     NS      NS
CPI Well-Being Score                     18.6[?]     6.58     18.35     5.63     NS      NS
CPI Stress Reduction Score                  4.55     4.37      5.65     3.82     NS      NS
Highest Education Level                     3.75     2.07      3.55     1.60     NS      NS
High School GPA (5 pt. scale)               3.11     1.48      4.20     1.19     NS      NS
Importance of Appraisal Procedures         71.25    23.58     62.10    27.76     NS      NS
Appraisal Knowledge Test                   12.41     4.67     13.25     4.15     NS      NS

[?] = not legible, or not assignable to a row, in the source copy.
Manager theme means: five of the six values are legible (2.33, 2.83, 4.16,
4.63, 2.75, in source order); the row of the missing value cannot be determined.
Embedded Figures, Bieri, and Age: two significant entries are legible
(Z = 2.39, significance .046; Z = 4.76, significance .001); the rows they
belong to cannot be determined.
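Table 1 compares manager and student means with a Z statistic, but the formula is not given in the source. One common choice, shown here only as an assumption, is the large-sample Z for the difference between two independent means, z = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2). The function names are hypothetical, and because sample sizes do not appear in the table, any n supplied to the sketch is the reader's own.

```python
from math import erfc, sqrt


def two_sample_z(m1, s1, n1, m2, s2, n2):
    """Large-sample Z for the difference between two independent means.

    m = mean, s = standard deviation, n = sample size for each group.
    Assumed formula; the source does not say how its Z was computed.
    """
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / standard_error


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal deviate z."""
    # 2 * (1 - Phi(|z|)) is equal to erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))
```

Under this reading, an entry of NS would correspond to a two-tailed p above the chosen cutoff (for example .05).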
Table 2

Correlations Between Rating Behaviors and Rating Outcomes
for Managers and Students

                                Rating Outcomes
Rating
Behaviors        Halo      Leniency     Rest. Range     Accuracy

MANAGERS

AVGNJ             .16       -.35**         .18            .01
AVGSDJ            .15       -.05           .00           -.01
AVGLAT            .09        .24          -.15           -.12
AVGSD             .12        .37**         .84***        -.03

STUDENTS

AVGNJ            -.11        .10           .22           -.53**
AVGSDJ           -.05        .36          -.22           -.43**
AVGLAT            .23       -.13           .04            .46**
AVGSD             .13        .01           .66***         .13

*p<.05
**p<.01
***p<.001
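As a hedged illustration of how a single entry in a table of this kind could be computed, the sketch below assumes the Pearson product-moment correlation (the source does not name the coefficient) and the usual t statistic for testing r against zero with n - 2 degrees of freedom. The function names are hypothetical, and sample sizes must come from the study itself.

```python
from math import copysign, inf, sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    if abs(r) >= 1:
        return copysign(inf, r)  # a perfect correlation has no finite t
    return r * sqrt((n - 2) / (1 - r * r))
```

The resulting t would then be compared with critical values at the .05, .01, and .001 levels to assign the asterisks.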
TABLE 3

Significant Correlations Between Rating Behaviors and Individual Difference
Variables for Managers and Students

Columns (Rating Behaviors):
   Managers:  AVGNJ  AVGSDJ  AVGLAT  AVGSD
   Students:  AVGNJ  AVGSDJ  AVGLAT  AVGSD

Rows (Individual Difference Variables):
   EMBF, CC, DET, TA, IA, PA, RT, IT, AT, ST, ET, CT, OUT, ADJ, DEC,
   FRIEND, OTHERS, CHEER, DOM, CONSID, TOL, WB, SR, HED, GPA, IMP, APP TEST

Correlations, in source order (row and column positions are not recoverable
from the source copy):
    .40   .35   .33
    .32
    .32
    .28
    .29   .35   .45
    .40   .46
   -.48  -.42  -.35  -.44  -.32  -.35  -.29
   -.32  -.37  -.34
   -.29  -.44  -.32  -.46  -.30  -.44  -.49
    .41
   -.51
   -.38
    .41
   -.48
   -.40
   -.42   .40   .48   .44
    .37   .38   .47   .38
    .38

See Table 1 for complete names of variables listed in this table.
Correlations are reported if significant at p<.05 or better.
Figure 1

Behavioral Constructs, Operations, and Variable Names

CONSTRUCT                          OPERATION                          VARIABLE

Degree of                          Number of judgments made           NJ
Information Utilization            for each ratee

Sensitivity to                     Variation in mean judgments        SDJ
Differences Between Ratees         for each ratee

Sensitivity to                     Variation in judgments for         SDJ
Ratee Strengths and Weaknesses     each ratee

Observational                      Latency before first judgment      LAT
Style
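The operations listed in Figure 1 can be sketched in code. This is a minimal illustration under stated assumptions, not the IRJ procedure itself: the record layout (ratee, seconds elapsed since that ratee was presented, judgment value), the function names, and the dictionary keys are all hypothetical, and "variation" is taken to mean the sample standard deviation. "Variation in mean judgments" is read here as the standard deviation of the per-ratee mean judgments, which is one possible interpretation.

```python
from collections import defaultdict
from statistics import mean, stdev


def per_ratee_behaviors(records):
    """Summarize one rater's reported judgments for each ratee.

    records: iterable of (ratee, seconds_elapsed, judgment) tuples.
    This layout is hypothetical; the source does not specify one.
    """
    by_ratee = defaultdict(list)
    for ratee, seconds, judgment in records:
        by_ratee[ratee].append((seconds, judgment))

    result = {}
    for ratee, events in by_ratee.items():
        events.sort(key=lambda event: event[0])  # order by time of report
        values = [judgment for _, judgment in events]
        result[ratee] = {
            "n_judgments": len(values),  # number of judgments made
            "sd_judgments": stdev(values) if len(values) > 1 else 0.0,
            "mean_judgment": mean(values),
            "latency": events[0][0],  # time before the first judgment
        }
    return result


def rater_summary(per_ratee):
    """Average the per-ratee indices across all ratees for one rater."""
    rows = list(per_ratee.values())
    ratee_means = [row["mean_judgment"] for row in rows]
    return {
        "avg_n_judgments": mean(row["n_judgments"] for row in rows),
        "avg_sd_judgments": mean(row["sd_judgments"] for row in rows),
        "avg_latency": mean(row["latency"] for row in rows),
        "sd_of_ratee_means": stdev(ratee_means) if len(ratee_means) > 1 else 0.0,
    }
```

Averaging across ratees in `rater_summary` mirrors the AVG prefix on the variable names in the tables, although the exact averaging used in the studies is not described in this section.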
Figure 2

Research Completed/In Progress

I.  Descriptive Studies of Rating Behavior
       Study 1.  Number and Kinds of Judgments
       Study 2.  Cue Selection and Evaluation
       Study 3.  Stability of Rating Behavior Across Ratees

II. Construct Validation of IRJ
    A. Robustness of the Technique
       Study 4.  Rate-rerate Reliability of Rating Behavior
       Study 5.  Generalizability of IRJ Results
       Study 6.  Impact of Reporting

    B. Validation of Behavioral Data: Correlations with Various Rating Outcomes
       Study 7.  Rating Behavior and Rating Outcomes
       Study 8.  Rating Behavior and Correlates of Accuracy
       Study 9.  Multiple Cue Use and Halo Error (In Progress)
       Study 10. Identification of Relevant Cues and Accuracy (In Progress)